Abstract

We define an entropy based on a chosen governing probability distribution. If a certain kind of measurement follows such a distribution, the distribution also gives us a suitable scale on which to study it. This scale appears as a link function that is applied to the measurements. A link function can also be used to define an alternative algebraic structure on a set. We will see that generalized entropies are equivalent to using a different scale for the phenomenon under study than the scale on which the measurements arrive. An extensive measurement scale is here a scale on which the measurements fulfill a memoryless property. We conclude that the alternative algebraic structure defined by the link function must be used if we continue to work on the original scale. We derive Tsallis entropy by using a generalized log-logistic governing distribution. Typical applications of Tsallis entropy are related to phenomena with power-law behaviour.
1 Introduction

1.1 Entropies 

There are many different ways to describe entropy. One common way is to view it as a measure of uncertainty. Formally, it is usually defined as the expectation of a function that depends only on an event's probability, i.e. we take the expectation of a function $V$ defined on $(0, 1]$. In information theory, the function is often described as measuring the information content of the event. For the classical Shannon-Boltzmann-Gibbs entropy (SBG entropy), $V(p) = \log\frac{1}{p}$. $V$ is then, among other things, non-negative and decreasing, has range $[0, \infty)$, and satisfies

$$V(p_1 p_2) = V(p_1) + V(p_2). \qquad (1)$$
The last property is called extensivity and is a critical assumption that characterizes SBG entropy. In physics, the extensivity property is motivated by the argument that if we have two independent systems, then the total energy of the combined system should be the sum of the energies of the separate systems. In information theory, the function $V$ is thought of as an idealized storage need. An efficient storage system is based on a coding scheme where common events have short descriptions (few bits) and rarer events demand more space. For a fixed coding scheme, the extensivity of the idealized storage need is natural. If we take a finance perspective on $V$, we could view it as the value, cost or price of something, e.g. the cost of a claim (or perhaps the total claims for a quarter) for an insurance company. Again, adding up independent costs is very reasonable.
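A tiny numerical illustration of the storage-need reading of $V$ may help. The sketch below is our own toy example: the helper `code_length_bits` is hypothetical, and base-2 logarithms are assumed so that $V$ is measured in bits. It evaluates the idealized code length $\log_2(1/p)$ and checks that, for two independent events, the code length of the joint event equals the sum of the separate code lengths, which is property (1).

```python
import math


def code_length_bits(p: float) -> float:
    """Idealized storage need V(p) = log2(1/p), in bits, for an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.log2(1.0 / p)


# Two independent events: the probability of observing both is the product.
p_a, p_b = 0.5, 0.125
joint = code_length_bits(p_a * p_b)
separate = code_length_bits(p_a) + code_length_bits(p_b)
print(f"V(p_a) = {code_length_bits(p_a)} bits, V(p_b) = {code_length_bits(p_b)} bits")
print(f"V(p_a p_b) = {joint} bits, V(p_a) + V(p_b) = {separate} bits")  # equal, as in (1)
```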

The extensivity property is an assumption that, together with some regularity assumptions, leads to SBG entropy; see Jaynes [?] for details. However, during the last decades there has been much interest in other entropies that are not extensive [?]. For Tsallis entropy [?] $S_q$, instead of Eq. (1) we have the relation $S_q(A, B) = S_q(A) + S_q(B) + (1 - q)S_q(A)S_q(B)$ for independent events $A$ and $B$. Tsallis entropy is defined by using a deformed logarithm $\log_q$ and is, therefore, an example of a larger class of entropies that are defined from deformed logarithms. If $\phi : [0, \infty) \to [0, \infty)$ is a strictly positive and non-decreasing function on $(0, \infty)$, then $\log_\phi$ is defined by

$$\log_\phi(p) = \int_1^p \frac{1}{\phi(s)}\,ds. \qquad (2)$$

If this integral converges for all finite $p > 0$, then $\log_\phi$ is a deformed logarithm.

Tsallis entropy, corresponding to $\phi(p) = p^q$, has been successfully applied to many areas of science and in particular to situations where heavy-tailed power-law behaviour is encountered. For a density $f$ it is defined by $S_q(f) = \frac{\int f(x)^q\,dx - 1}{1 - q}$. Despite the practical successes, there is much debate about the foundations for using non-extensive entropies [?; ?; ?; ?].
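Definition (2) can be checked numerically. The sketch below rests on our own assumptions (composite Simpson's rule with a fixed number of steps; `deformed_log` and `log_q` are hypothetical helper names). It integrates $1/\phi(s)$ from 1 to $p$ for $\phi(p) = p^q$ and compares the result with the closed form $\frac{p^{1-q} - 1}{1 - q}$, the q-logarithm behind Tsallis entropy. Taking $\phi(s) = s$ instead recovers the ordinary logarithm.

```python
import math
from typing import Callable


def deformed_log(p: float, phi: Callable[[float], float], steps: int = 10_000) -> float:
    """log_phi(p) = integral from 1 to p of ds / phi(s), Eq. (2), by composite Simpson's rule."""
    if p <= 0:
        raise ValueError("p must be positive")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of sub-intervals
    a, b = 1.0, p  # for p < 1 the step h is negative, giving the (negative) oriented integral
    h = (b - a) / steps
    total = 1.0 / phi(a) + 1.0 / phi(b)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) / phi(a + k * h)
    return total * h / 3.0


def log_q(p: float, q: float) -> float:
    """Closed-form q-logarithm (p^(1-q) - 1) / (1 - q); the ordinary logarithm when q == 1."""
    if q == 1.0:
        return math.log(p)
    return (p ** (1.0 - q) - 1.0) / (1.0 - q)


q = 0.7
for p in (0.2, 1.0, 3.5):
    numeric = deformed_log(p, lambda s: s ** q)  # phi(s) = s^q
    print(f"p = {p}: integral = {numeric:.10f}, closed form = {log_q(p, q):.10f}")
```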

An entropy is used in modeling to choose one among all the probability distributions that satisfy certain constraints encoding available information. Jaynes, the founder of the principle of maximum entropy as a principle for statistical inference, explained it as choosing the least biased among the distributions, i.e. the distribution that does not include any further assumptions [?]. The entropy itself, however, introduces an extra assumption, since not every distribution can arise through maximizing SBG entropy subject to constraints. The distributions that are compatible with SBG entropy constitute the exponential family. Those distributions have cumulative distribution functions (CDF) $F$ of the form $dF(x|\eta) = e^{\eta^T T(x) - A(\eta)}\,dG(x)$, where $G$ is the reference measure, $T = (T_1, \dots, T_n)$ is a sufficient statistic, $\eta = (\eta_1, \dots, \eta_n)$ is the natural parameter and $A(\eta)$ is needed for normalization. In the case when the reference measure is the Lebesgue measure on $\mathbb{R}$ and we only use one constraint $E(g(X)) = \eta$, we have a family of densities of the form $C_\lambda e^{\lambda g(x)}$. Given a density $f$ with finite SBG entropy, we can let $g(x) = \log f(x)$ and thereby define a one-parameter family of densities of the form $C_\lambda f(x)^\lambda$. Letting $\eta = \int f(x)\log(f(x))\,dx$ implies $\lambda = 1$. The expectation of $g$ with respect to another density $h$ is the negative of the cross entropy between $h$ and $f$.
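The tilted family $C_\lambda f(x)^\lambda$ can be made concrete with a sketch. Purely for illustration, we take $f$ to be the standard normal density (our assumption, not a choice made in the text), normalize $f^\lambda$ by numerical integration and evaluate the constraint $E(g(X))$ with $g = \log f$ under the tilted density. At $\lambda = 1$ the constraint value equals $\eta = \int f \log f\,dx = -\tfrac{1}{2}\log(2\pi e)$; other values of $\lambda$ give other constraint values. The helper names are hypothetical.

```python
import math


def f(x: float) -> float:
    """Standard normal density, our illustrative choice of base density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(func, lo: float = -12.0, hi: float = 12.0, n: int = 24_000) -> float:
    """Composite trapezoidal rule; contributions beyond |x| = 12 are negligible here."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, n):
        total += func(lo + k * h)
    return total * h


def constraint_value(lam: float) -> float:
    """E(g(X)) with g = log f, where X has the tilted density C_lam * f(x)**lam."""
    c_lam = 1.0 / integrate(lambda x: f(x) ** lam)
    return integrate(lambda x: c_lam * f(x) ** lam * math.log(f(x)))


eta = -0.5 * math.log(2.0 * math.pi * math.e)  # integral of f log f for the standard normal
for lam in (0.5, 1.0, 2.0):
    print(f"lambda = {lam}: E(log f) = {constraint_value(lam):.8f} (target eta = {eta:.8f})")
```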

Although densities are always assumed to be integrable, non-negative and to have integral one, the space where the SBG entropy is finite is a proper subspace of $L_1(\mathbb{R})$. For $q \neq 1$, the Tsallis entropy for a density $f$ is finite if $f \in L_q(\mathbb{R})$. Finiteness of SBG entropy for a density $f$ can be seen as an extra regularity condition which coincides with the condition that is necessary to ensure that the density's Hardy-Littlewood maximal function

$$(Mf)(x) = \sup_{r > 0} \frac{1}{\mu(B_r(x))}\int_{B_r(x)} |f(y)|\,d\mu(y)$$

is locally integrable; here $B_r(x) = \{y \mid |x - y| < r\}$. For every $q > 1$, $M$ is a bounded sublinear operator from $L_q(\mathbb{R}^n)$ to itself. The maximal inequalities of Hardy and Littlewood imply Lebesgue's differentiation theorem regarding differentiating integrals of integrable functions and Rademacher's theorem regarding differentiating Lipschitz functions. The maximal operator plays a central role in several areas of modern mathematical analysis, including real analysis, harmonic analysis and functional analysis [?]. A functional analysis approach to entropy is to use the characteristic function $V(p) = \|1_A\|_X$ of a rearrangement invariant (r.i.) normed function space $X$, where $1_A$ is the indicator of a set $A$ with measure $p$. The r.i. property implies that $\|1_A\|_X$ only depends on $p$.
1.2 The Approach of This Article



The purpose of this article is to provide a theoretical approach to defining entropies, a probability-theoretic foundation for using them, and a method for choosing an entropy that is suitable for a particular modeling task. We reach the conclusion that this is equivalent to finding an appropriate scale for a certain kind of observation.

To understand why a function $V$ that does not satisfy the extensivity assumption (1) can still be sensible, we consider the task of modeling dramatic rainfall. Expressions like "10-year storm" often occur when describing such events. If we have a 1/10 probability of having a storm with rainfall amount $V(1/10)$ in any given year then, assuming independence, we would have a 1/100 probability of having two of those storms (in two given years). The total rainfall of two such storms would be $2V(1/10)$, which certainly does not have to be the same as the rainfall encountered during a "100-year storm", which is what $V(1/100)$ would denote. This distinction between two independent events with probability 1/10 and one event with probability 1/100 is not noticed by the SBG entropy, which is extensive. The extensivity that characterizes SBG entropy is related to the memoryless property of the exponential distribution, i.e.

$$\Pr(X > a + b) = \Pr(X > a)\Pr(X > b) \qquad (3)$$

if and only if the random variable $X$ is exponentially distributed. Note that $V(p) = \log\frac{1}{p}$ implies that $V^{-1}(t) = e^{-t}$, which is the survival function (complementary cumulative distribution function) of the exponential distribution with mean 1, i.e.

$$V^{-1}(t) = \Pr(X > t) \qquad (4)$$

for a random variable $X$ with such a probability distribution. Thus, $V(p)$ is the answer to the question: at least how large can you say with probability (certainty) $p$ that $X$ will be? This is, in a sense, the information that corresponds to the probability $p$. This question can be answered for any probability distribution on $[0, \infty)$, and in this article we make that the foundation for defining an entropy.
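The question "at least how large can you say with probability $p$ that $X$ will be?" is answered by inverting the survival function. The sketch below compares two governing distributions: the exponential one from (4) and, purely as an illustrative assumption, a log-logistic one with $\Pr(X > t) = 1/(1+t)$. For the exponential case $2V(1/10) = V(1/100)$, so two "10-year" events and one "100-year" event are indistinguishable, whereas the log-logistic choice separates them. The function names are hypothetical.

```python
import math


def v_exponential(p: float) -> float:
    """V(p) for the exponential governing distribution with mean 1: Pr(X > V(p)) = p."""
    return math.log(1.0 / p)


def v_log_logistic(p: float) -> float:
    """V(p) for an illustrative governing distribution with Pr(X > t) = 1 / (1 + t)."""
    return 1.0 / p - 1.0


for name, v in (("exponential", v_exponential), ("log-logistic", v_log_logistic)):
    two_ten_year = 2 * v(1 / 10)   # two independent "10-year" events
    hundred_year = v(1 / 100)      # one "100-year" event
    print(f"{name}: 2V(1/10) = {two_ten_year:.3f}, V(1/100) = {hundred_year:.3f}")
```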



A key tool to connect a governing distribution to a scale is Ghitany's [10] generalized memoryless characterization, which can be formulated for any probability distribution. Those properties are of the form

$$\Pr(h(X) > a + b) = \Pr(h(X) > a)\Pr(h(X) > b) \qquad (5)$$

for a function $h$ that we will call a link function. In this article we will prove that Tsallis entropy corresponds to the generalized log-logistic distributions that Ghitany paid special attention to.






1.3 Scales 



Using a link function is related to changing the scale. If a measurement results in $X$, we are instead really studying $h(X)$; or, if we are using a statistic $T$ and are estimating a vector $\eta$ of parameters, we would study $h(\eta^T T(x))$ instead of $\eta^T T(x)$.



One famous example of a scale is the Richter scale, which is used to measure the strength of an earthquake. It is a base-10 logarithmic scale calculated from displacements from zero on a seismometer output.



When it comes to our rainfall example, the distribution should clearly not be memoryless in the ordinary sense. If we have had 0 mm of rain by noon on a specific day, the probability that we would get $x$ mm after noon is smaller than the probability that we would get an additional $x$ mm of rain that day if we have had $a > 0$ mm already. If we change the scale in the correct way, the distribution could, however, be memoryless on that scale.



In the theory of Extensive Measurement [11; 12], scales are functions $h : G \to \mathbb{R}_+$ on ordered semigroups with the properties 1. $h(x) \le h(y)$ iff $x \le y$ and 2. $h(x \times y) = h(x) + h(y)$, where $\times$ is the group operation and $+$ is the usual addition in $\mathbb{R}_+$. In this article, we will use a chosen $h$ to define a group operation which makes $h$ additive, instead of the other way around.



Albert Einstein rescaled measurements of the fundamental extensive quantities, e.g. he defined the distance between two points based on how long it would take light to travel from one to the other. In the presence of curved space, this does not coincide with the Euclidean distance between the points.



1.4 Outline 



In Section 2 we review various approaches to describing the behaviour that governs a system, and in Section 3 we review Ghitany's generalized memoryless property. Section 4 deals with analogue algebra such as q-deformations and describes how a link function can be used either to define a memoryless property of a distribution or to deform the elementary operations. Section 5 is about Tsallis entropy. Section 6 summarizes the general results in the form of theorems, and Section 7 concludes our discussion of rainfall modeling, which is used as an example throughout the article. Section 8 contains some concluding remarks and Section 9 is a short summary.
2 Governing a System



2.1 Governing Distributions and Superstatistics 



The idea of using a probability distribution to define an entropy has been proposed recently by Niven [13]. Niven used a combinatorial approach in which he returns to the origins of SBG entropy. Consider a gas of $n$ particles in an enclosed space that we have divided into cells of equal size. The assumption that the particles will distribute themselves over the cells according to a uniform multinomial distribution is the beginning of the argument that leads to SBG entropy. Niven considered various other alternatives, including Polya urn models.

A Polya urn can be described by considering an urn containing balls of various colours, e.g. one ball of each of several different colours. When a ball is drawn from the urn, we note its colour, put it back and add an extra ball of that colour. Thereby we increase the probability of drawing that colour the next time. The expected proportions after the next draw always equal the current proportions, i.e. the process satisfies the martingale property. The colours here represent different cells and the balls represent particles. If we let the number of drawn balls $n$ tend to infinity, we end up with a multinomial, since each added ball becomes a smaller and smaller proportion of the total number of balls. Which multinomial we will end up with is, however, not predetermined. The resulting proportions follow a Dirichlet distribution, and the Bayesian formulation of a Polya urn is to have a Dirichlet prior on the family of multinomials. Therefore, it is sometimes called the Dirichlet Compound Multinomial. The parameters are the initial numbers of balls in the urn, e.g. $\alpha > 0$ of each colour. Polya urns have the important property that they are infinitely exchangeable, i.e. the probability of a finite sequence of any length does not depend on the order.
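Exchangeability of the Polya urn can be checked exactly with rational arithmetic. In the sketch below (our own illustration; `polya_sequence_probability` is a hypothetical helper), the urn starts with one ball of each of three colours, and we compute the probability of every reordering of one draw sequence. All orderings receive the same probability.

```python
from fractions import Fraction
from itertools import permutations


def polya_sequence_probability(sequence, initial_counts):
    """Exact probability of drawing the colours in `sequence`, in that order, from a Polya urn.

    After each draw the ball is put back together with one extra ball of the same colour.
    """
    counts = list(initial_counts)
    prob = Fraction(1)
    for colour in sequence:
        prob *= Fraction(counts[colour], sum(counts))
        counts[colour] += 1  # reinforcement step
    return prob


initial = [1, 1, 1]        # one ball of each of three colours
seq = (0, 0, 2, 1, 0)
orderings = set(permutations(seq))
probs = {polya_sequence_probability(s, initial) for s in orderings}
print(len(orderings), probs)  # 20 orderings, one common probability: {Fraction(1, 420)}
```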

The Dirichlet distributions are the conjugate priors to the multinomials. A Dirichlet distribution can also be described as forming a vector of proportions by dividing each of $N$ independent, identically Gamma distributed random variables by their sum, i.e. normalizing the vector consisting of the original variables.
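The Gamma construction of the Dirichlet distribution takes only a few lines. The sketch below assumes unit scale for the Gamma variables (any common scale cancels in the normalization) and uses Python's `random.gammavariate(shape, scale)`; the seed and sample size are arbitrary. With identical shapes the result is a symmetric Dirichlet, whose component means are $1/N$.

```python
import random


def dirichlet_sample(alphas, rng: random.Random):
    """One Dirichlet(alphas) draw: normalize independent Gamma(alpha_i, scale 1) variables."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(2024)
alphas = [2.0, 2.0, 2.0, 2.0]   # identical shapes: a symmetric Dirichlet
samples = [dirichlet_sample(alphas, rng) for _ in range(20_000)]
means = [sum(s[i] for s in samples) / len(samples) for i in range(len(alphas))]
print([round(m, 3) for m in means])  # each close to alpha_i / sum(alphas) = 0.25
```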



This observation leads us to the Beck-Cohen superstatistics [14] idea of defining a distribution for one cell. They pick a probability density function $f(\beta)$ on $(0, \infty)$, where $\beta$ is the inverse temperature, and then normalize by a generalized Boltzmann factor defined by

$$B(E) = \int_0^\infty f(\beta)\,e^{-\beta E}\,d\beta. \qquad (6)$$
Tsallis statistics has been derived as a special case of superstatistics by letting $f$ be the density of a Gamma distribution. Superstatistics has been successfully applied, e.g., to model turbulent systems with temperature fluctuations. In superstatistics, a cell has a distribution over temperatures instead of a fixed temperature and can, therefore, incorporate fluctuating local temperatures. The governing distribution approach used in this article is not identical to the superstatistical one, since we work on the other end of the Laplace transform [6]. $B(E)$ is the Laplace transform of the density, and from that it follows that $B(0) = 1$, that $B(E)$ is decreasing and that it tends to $0$ as $E$ tends to infinity. $B(E)$ is, therefore, a survival function, and we can find $E$ as a function of $p = B(E)$. This recovers our approach. We have a generalized log-logistic distribution as our governing distribution for Tsallis statistics.
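For a Gamma density the Laplace transform in (6) has a closed form: with shape $c$ and rate $b$, $B(E) = (1 + E/b)^{-c}$, which is a GLLD2 survival function with $\beta = 1$. The sketch below checks this numerically with Simpson's rule; the parameter values and helper names are illustrative assumptions. With $c = b = \alpha$ the closed form coincides with the survival function $(1 + x/\alpha)^{-\alpha}$ that appears in (33).

```python
import math


def gamma_density(beta: float, shape: float, rate: float) -> float:
    """Gamma density of the inverse temperature beta, with shape c and rate b."""
    return rate ** shape * beta ** (shape - 1.0) * math.exp(-rate * beta) / math.gamma(shape)


def boltzmann_factor(energy: float, shape: float, rate: float,
                     upper: float = 60.0, n: int = 60_000) -> float:
    """B(E) of Eq. (6) by Simpson's rule on [0, upper]; the integrand is negligible beyond it."""
    h = upper / n  # n must be even for Simpson's rule

    def integrand(beta: float) -> float:
        return gamma_density(beta, shape, rate) * math.exp(-beta * energy)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0


shape = rate = 3.0  # c = b = alpha; compare with Eq. (33)
for energy in (0.0, 0.5, 2.0, 10.0):
    closed_form = (1.0 + energy / rate) ** (-shape)
    print(f"E = {energy}: numeric B(E) = {boltzmann_factor(energy, shape, rate):.10f}, "
          f"(1 + E/b)^(-c) = {closed_form:.10f}")
```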

Note that Equation (6) employs the Maxwell-Boltzmann distribution, which assigns probabilities for the kinetic energy of a particle as a function of the temperature. Therefore, defining a probability distribution over temperatures amounts to defining probabilities for probabilities of energy, which is Bayesian statistics. Superstatistics is just a particular way of expressing and understanding a prior that is suitable for thermodynamics.



2.2 Governing Priors 



Instead of beginning with an assumption that particles distribute themselves over cells according to a uniform multinomial or another predetermined multinomial, we can assume that they are distributed according to a multinomial that is drawn from a Dirichlet distribution. We can describe this as having a distribution that is a mixture of an infinite number of multinomials. Sampling from such a distribution can be performed using the Polya urn scheme described in the previous section. In that section we mentioned that Polya urn models are infinitely exchangeable or, in other words, "Bag of Particle Models". According to de Finetti's theorem, every infinitely exchangeable distribution can be represented as a mixture of multinomials, i.e. with a prior.



2.3 Governing Dynamics 



As discussed recently by Cohen [15], there is a connection between dynamics and statistical mechanics. The dynamics describe a particle theory, which is used to define the statistics. SBG entropy is based on the differential equation

$$\frac{dy}{dx} = y, \quad y(0) = 1, \qquad (7)$$

whose solution $y(x) = e^x$ can also be written as $x = \log(y)$. To define a governing distribution we let

$$\Pr(X > x) = \frac{1}{y(x)}. \qquad (8)$$

If we instead start from

$$\frac{dy}{dx} = \phi(y), \quad y(0) = 1, \qquad (9)$$

with $\phi$ as in (2), the result is

$$x(y) = \int_1^y \frac{1}{\phi(s)}\,ds, \qquad (10)$$

and letting $p = \frac{1}{y}$ and $V(p) = x(y)$ implies that

$$V(p) = \log_\phi\left(\frac{1}{p}\right). \qquad (11)$$

If we refer to $y = \frac{1}{p}$ as the surprise and $x(y)$ as the cost of the surprise, e.g. the storage demand under the coding scheme that is being used, then the differential equation governs how the cost increases with the surprise.
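Equations (9)-(11) can be illustrated numerically for $\phi(y) = y^q$. The sketch below is our own construction, using a classical fourth-order Runge-Kutta step; the helper names are hypothetical, and `exp_q`/`log_q` follow the q-exponential and q-logarithm of Section 4.1. It solves $dy/dx = y^q$, $y(0) = 1$, compares $y(x)$ with $(1 + (1-q)x)^{1/(1-q)}$, and checks that $V(p) = \log_q(1/p)$ with $p = 1/y(x)$ returns the starting value $x$.

```python
import math


def solve_governing_ode(phi, x_end: float, steps: int = 2_000) -> float:
    """Integrate dy/dx = phi(y), y(0) = 1 (Eq. (9)) up to x_end with classical RK4."""
    h = x_end / steps
    y = 1.0
    for _ in range(steps):
        k1 = phi(y)
        k2 = phi(y + 0.5 * h * k1)
        k3 = phi(y + 0.5 * h * k2)
        k4 = phi(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


def exp_q(v: float, q: float) -> float:
    """q-exponential (1 + (1-q) v)_+^(1/(1-q)); the ordinary exponential when q == 1."""
    if q == 1.0:
        return math.exp(v)
    base = 1.0 + (1.0 - q) * v
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def log_q(y: float, q: float) -> float:
    """q-logarithm, the inverse of exp_q on its range."""
    return math.log(y) if q == 1.0 else (y ** (1.0 - q) - 1.0) / (1.0 - q)


q, x = 0.6, 1.5
y = solve_governing_ode(lambda s: s ** q, x)   # phi(y) = y^q
p = 1.0 / y                                    # Pr(X > x) = 1 / y(x), Eq. (8)
print(f"y(x) = {y:.10f}, exp_q(x) = {exp_q(x, q):.10f}")
print(f"V(p) = log_q(1/p) = {log_q(1.0 / p, q):.10f}, x = {x}")  # Eq. (11) recovers x
```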



3 Characterization Of The Generalized Log-Logistic Distribution 



This section is based closely on Ghitany's presentation [10]. The exponential distributions have cumulative distribution function (CDF)

$$F(t) = 1 - e^{-\lambda t}, \quad \lambda > 0,\ t \ge 0, \qquad (12)$$

and, therefore, survival function

$$\bar F(t) = 1 - F(t) = e^{-\lambda t}. \qquad (13)$$

An alternative way of characterizing exponentially distributed random variables is by the memoryless property

$$\Pr(X > a + b) = \Pr(X > a)\Pr(X > b). \qquad (14)$$

Other distributions also have similar characterizing properties. If $X$ has a continuous CDF $F$, then both $F(X)$ and $\bar F(X)$ are uniformly distributed on the interval $(0, 1)$. Therefore, both $G(X) = \log((F(X))^{-1})$ and $\bar G(X) = \log((\bar F(X))^{-1})$ are exponentially distributed with $\lambda = 1$. Thus, as Ghitany shows, $F$ has the characterizing property

$$\Pr(\log((F(X))^{-1}) > a + b) = \Pr(\log((F(X))^{-1}) > a)\,\Pr(\log((F(X))^{-1}) > b) \qquad (15)$$
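or we can use $\bar F$ instead of $F$. If $X$ satisfies (15), there is $\lambda > 0$ such that $\Pr(G(X) > t) = e^{-\lambda t}$. Ghitany considered two classes of generalized log-logistic distributions, namely GLLD1, defined by

$$F(t) = \frac{1}{(1 + (t/\alpha)^{-\beta})^m}, \quad t > 0,\ \alpha, \beta, m > 0, \qquad (16)$$

and GLLD2, defined by

$$F(t) = 1 - \frac{1}{(1 + (t/\alpha)^{\beta})^n}, \quad t > 0,\ \alpha, \beta, n > 0. \qquad (17)$$

The GLLD1 characterizing property becomes

$$\Pr(\log((1 + (T/\alpha)^{-\beta})^m) > a + b) = \Pr(\log((1 + (T/\alpha)^{-\beta})^m) > a)\,\Pr(\log((1 + (T/\alpha)^{-\beta})^m) > b) \qquad (18)$$

and GLLD2 is, using $\bar F$, characterized by

$$\Pr(\log((1 + (T/\alpha)^{\beta})^n) > a + b) = \Pr(\log((1 + (T/\alpha)^{\beta})^n) > a)\,\Pr(\log((1 + (T/\alpha)^{\beta})^n) > b). \qquad (19)$$

The properties do not depend on $n$ and $m$, since multiplying the link function by a positive constant only changes the rate $\lambda$; therefore, we can e.g. choose $m = n = 1$, or $1/m = n = \alpha$. Using $m = n = 1$, Ghitany provided the elegant characterization

$$\Pr(1 + (T/\alpha)^{-\beta} > xy) = \Pr(1 + (T/\alpha)^{-\beta} > x)\,\Pr(1 + (T/\alpha)^{-\beta} > y), \quad x, y > 1 \qquad (20)$$

for GLLD1 and the following one for GLLD2:

$$\Pr(1 + (T/\alpha)^{\beta} > xy) = \Pr(1 + (T/\alpha)^{\beta} > x)\,\Pr(1 + (T/\alpha)^{\beta} > y), \quad x, y > 1. \qquad (21)$$

He also gives the corresponding property of the Weibull distribution, $F(t) = 1 - e^{-(t/\alpha)^\beta}$:

$$\Pr(X^\beta > a + b) = \Pr(X^\beta > a)\Pr(X^\beta > b), \quad a, b > 0. \qquad (22)$$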
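The characterizations (19), (21) and (22) can be checked by simulation. The sketch below is illustrative only: the parameter values, the seed and the helper `memoryless_check` are our assumptions. It samples a GLLD2 with $n = 1$ by inverting its survival function and a Weibull via `random.weibullvariate(scale, shape)`, applies the link functions $\log(1 + (t/\alpha)^\beta)$ and $t^\beta$, and compares $\Pr(h(X) > a + b)$ with $\Pr(h(X) > a)\Pr(h(X) > b)$. The last line shows that the raw GLLD2 variable is not memoryless.

```python
import math
import random


def memoryless_check(samples, link, a, b):
    """Estimate both sides of Pr(link(X) > a + b) = Pr(link(X) > a) Pr(link(X) > b)."""
    linked = [link(s) for s in samples]

    def tail(t):
        return sum(1 for v in linked if v > t) / len(linked)

    return tail(a + b), tail(a) * tail(b)


rng = random.Random(7)
alpha, beta, n_samples = 2.0, 1.5, 200_000

# GLLD2 with n = 1: survival 1 / (1 + (t/alpha)^beta); invert it at a uniform u in (0, 1].
glld2 = [alpha * (1.0 / (1.0 - rng.random()) - 1.0) ** (1.0 / beta) for _ in range(n_samples)]
# Weibull with survival exp(-(t/alpha)^beta); random.weibullvariate takes (scale, shape).
weibull = [rng.weibullvariate(alpha, beta) for _ in range(n_samples)]


def glld2_link(t):
    return math.log1p((t / alpha) ** beta)   # link behind Eqs. (19) and (21)


def weibull_link(t):
    return t ** beta                         # X^beta in Eq. (22)


print("GLLD2, linked:  ", memoryless_check(glld2, glld2_link, 0.4, 0.9))
print("Weibull, linked:", memoryless_check(weibull, weibull_link, 0.5, 1.0))
print("GLLD2, raw T:   ", memoryless_check(glld2, lambda t: t, 1.0, 1.0))  # not memoryless
```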



4 Analogue Algebra 

4.1 q-Analogues

Results in Tsallis entropy are often efficiently expressed using q-analogues of elementary mathematical operations [16; 17]. The q-logarithm is defined for $q > 0$ as

$$\log_q(p) = \begin{cases} \log(p) & \text{if } q = 1, \\ \dfrac{p^{1-q} - 1}{1 - q} & \text{otherwise.} \end{cases} \qquad (23)$$

We let the notation $(v)_+$ mean $v$ if $v > 0$ and $0$ otherwise. The inverse of the q-logarithm is

$$\exp_q(v) = (1 + (1 - q)v)_+^{\frac{1}{1-q}}.$$

Using these two functions, we can define an analogue of multiplication:

$$x \otimes_q y = \exp_q(\log_q(x) + \log_q(y)) = (x^{1-q} + y^{1-q} - 1)^{\frac{1}{1-q}}$$

if $x^{1-q} + y^{1-q} - 1 > 0$, and otherwise it is $0$. It is associative, commutative and has $1$ as its neutral element. Under this definition of $\otimes_q$, we have the identities

$$\exp_q(x + y) = \exp_q(x) \otimes_q \exp_q(y),$$

$$\log_q(x \otimes_q y) = \log_q(x) + \log_q(y) \quad \text{(whenever the left-hand side is defined).}$$
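The q-analogues translate directly into code. The sketch below (helper names are ours) implements $\log_q$, $\exp_q$ and $\otimes_q$ and checks the two identities numerically for one value of $q$ below 1 and one above 1. It only evaluates points where the bases are positive; the convention $(v)_+ = 0$ combined with the negative exponent $1/(1-q)$ for $q > 1$ is not exercised.

```python
import math


def log_q(p: float, q: float) -> float:
    """q-logarithm of Eq. (23)."""
    return math.log(p) if q == 1.0 else (p ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(v: float, q: float) -> float:
    """Inverse of log_q: (1 + (1-q) v)_+ ** (1/(1-q))."""
    if q == 1.0:
        return math.exp(v)
    base = 1.0 + (1.0 - q) * v
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def q_product(x: float, y: float, q: float) -> float:
    """x (x)_q y = (x^(1-q) + y^(1-q) - 1)^(1/(1-q)) if the base is positive, else 0."""
    if q == 1.0:
        return x * y
    base = x ** (1.0 - q) + y ** (1.0 - q) - 1.0
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


for q in (0.5, 1.5):
    x, y = 2.0, 3.0
    print(q, q_product(x, y, q), exp_q(log_q(x, q) + log_q(y, q), q))   # same value
    print(q, log_q(q_product(x, y, q), q), log_q(x, q) + log_q(y, q))   # same value
```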



4.2 Analogues Based on Link Functions

The deformed multiplication above is a special case of defining an algebraic structure using a link function. The idea is that if we have a set $\Omega$ and an invertible function $h$ from $\Omega$ to a set with an algebraic structure of some kind, we can pull that structure back using $h$ and define a corresponding structure on $\Omega$.

Suppose that we have a commutative semiring $R$, i.e. a set on which a multiplication and an addition have been defined such that the addition is commutative, associative and has an identity, and the multiplication is commutative, associative, has an identity element and distributes over the addition. It is also usually part of the assumption that $0 \cdot r = 0$ for all $r \in R$.

If $h : \Omega \to R$ is bijective, then we can define multiplication and addition on $\Omega$ by letting

$$x \otimes_h y = h^{-1}(h(x)h(y)) \qquad (24)$$

and

$$x \oplus_h y = h^{-1}(h(x) + h(y)). \qquad (25)$$

The additive and multiplicative identities are $h^{-1}(0)$ and $h^{-1}(1)$. We could also write that $h(x \otimes_h y) = h(x)h(y)$ and $h(x \oplus_h y) = h(x) + h(y)$. The q-analogues in the previous section are defined by using the link function $x \mapsto e^{\log_q(x)}$ to pull back the usual multiplication. If we want an addition such that $\exp_q(x \oplus_q y) = \exp_q(x)\exp_q(y)$, then we would use the function $h_q(x) = \log(\exp_q(x))$ to define $x \oplus_q y$. The result is

$$x \oplus_q y = x + y + (1 - q)xy. \qquad (26)$$

We focus on pulling back one operation at a time using a link function, e.g. we use $h_q$ to define $\oplus_q$, or the other link function $\exp(\log_q(\cdot))$ to define $\otimes_q$. We are really just defining semigroups for which a given link function becomes an extensive scale. We are either interested in adding data or multiplying probabilities.
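Pulling back an operation through a link function is a one-liner once $h$ and $h^{-1}$ are available. The sketch below is a minimal illustration with hypothetical helper names, assuming $h$ is invertible on the arguments used. It builds $\oplus_h$ and $\otimes_h$ from (24)-(25), uses $h_q(x) = \log(\exp_q(x))$ with inverse $z \mapsto \log_q(e^z)$ to recover (26), and uses $x \mapsto e^{\log_q(x)}$ to recover the q-product.

```python
import math
from typing import Callable

Link = Callable[[float], float]


def pull_back_addition(h: Link, h_inv: Link) -> Callable[[float, float], float]:
    """x (+)_h y = h^{-1}(h(x) + h(y)), as in Eq. (25)."""
    return lambda x, y: h_inv(h(x) + h(y))


def pull_back_multiplication(h: Link, h_inv: Link) -> Callable[[float, float], float]:
    """x (x)_h y = h^{-1}(h(x) h(y)), as in Eq. (24)."""
    return lambda x, y: h_inv(h(x) * h(y))


q = 0.4


def log_q(p: float) -> float:
    return (p ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(v: float) -> float:
    base = 1.0 + (1.0 - q) * v
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def h_q(x: float) -> float:        # link function log(exp_q(x))
    return math.log(exp_q(x))


def h_q_inv(z: float) -> float:    # its inverse log_q(e^z)
    return log_q(math.exp(z))


oplus_q = pull_back_addition(h_q, h_q_inv)
otimes_q = pull_back_multiplication(lambda x: math.exp(log_q(x)), lambda z: exp_q(math.log(z)))

x, y = 1.3, 2.2
print(oplus_q(x, y), x + y + (1.0 - q) * x * y)            # Eq. (26)
print(otimes_q(2.0, 3.0), exp_q(log_q(2.0) + log_q(3.0)))  # the q-product of Section 4.1
print(h_q_inv(0.0))                                         # additive identity h^{-1}(0) = 0
```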



4.3 Memoryless Properties Based on Link Functions

The characterizing memoryless property of a probability distribution can be expressed by providing a function $h$, namely $h(t) = \log((F(t))^{-1})$ or, alternatively, the same formula but with $\bar F = 1 - F$. We can then express this as

$$\Pr(h(X) > a + b) = \Pr(h(X) > a)\Pr(h(X) > b) \qquad (27)$$

or, when $h$ is increasing,

$$\Pr(X > h^{-1}(a + b)) = \Pr(X > h^{-1}(a))\Pr(X > h^{-1}(b)). \qquad (28)$$

Setting $x_0 = h^{-1}(a)$ and $x_1 = h^{-1}(b)$, the latter equation becomes

$$\Pr(X > x_0 \oplus_h x_1) = \Pr(X > x_0)\Pr(X > x_1). \qquad (29)$$

If $X$ satisfies a memoryless property based on a link function $h$, we can conclude that for some $\lambda > 0$ and all $x > 0$

$$\Pr(h(X) > x) = e^{-\lambda x}. \qquad (30)$$

For the Weibull distribution we have the simple function $h(x) = x^\beta$. For GLLD1 and GLLD2, we will consider special cases resulting from letting $\beta = 1$. Letting $1/m = n = \alpha$ does not, as previously mentioned, result in any loss of generality. Then we have the link functions $h_1(x) = \log((1 + \frac{\alpha}{x})^{1/\alpha})$ and $h_2(x) = \log((1 + \frac{x}{\alpha})^{\alpha})$. Note that $h_1(x)$ tends pointwise to $\frac{1}{x}$ when $\alpha \to 0^+$ and $h_2(x)$ tends pointwise to $x$ when $\alpha \to \infty$. Ghitany preferred to express the GLLD1 and GLLD2 properties with $n = m = 1$.
5 Tsallis entropy

5.1 Deriving Tsallis Entropy 



We know that to define Tsallis entropy we want the function $V$ to satisfy

$$V(p_1 p_2) = V(p_1) \oplus_q V(p_2). \qquad (31)$$

Therefore, we would like to find a governing distribution that results in the link function that defines $\oplus_q$. We do that by using $h_2$ from the previous section, i.e. the link function that we got from GLLD2 with $\beta = 1$.

Letting $q = 1 - \frac{1}{\alpha}$ for $\alpha > 0$ implies that $\alpha = \frac{1}{1-q}$ and

$$h_2(x) = \log((1 + (1 - q)x)^{\frac{1}{1-q}}) = \log(\exp_q(x)), \qquad (32)$$

which is the desired link function. This scheme defines a governing distribution which leads to this link function for every $q < 1$. The case $q = 1$ is simply the classical case with the identity link function, which corresponds to having an exponential governing distribution.

For GLLD2 with $\beta = 1$, $n = \alpha > 0$ and $q = 1 - \frac{1}{\alpha}$,

$$\Pr(X > x) = \frac{1}{(1 + \frac{x}{\alpha})^{\alpha}} = \frac{1}{\exp_q(x)} = \exp_{2-q}(-x). \qquad (33)$$

If $q > 1$, we can define a probability distribution on $(0, \frac{1}{q-1})$ by using the same formula

$$\Pr(X > x) = \frac{1}{\exp_q(x)}, \qquad (34)$$

which immediately results in the desired link function $\log(\exp_q)$ for $q > 1$. Thus, for all $q$, this link function comes from a distribution defined by (34); however, for $q > 1$ it is defined on a finite interval, which depends on $q$.

If we base our information measure $V$ on (34), i.e. we let $V(p)$ be the value $x$ for which $\Pr(X > x) = p$, the result is

$$V(p) = \log_q\left(\frac{1}{p}\right) = -\log_{2-q}(p). \qquad (35)$$

To define the corresponding entropy, we need to calculate the expectation of $V$. Suppose that we have a random variable with $M$ different outcomes that we are trying to model. If we have a distribution $\mu$ that assigns probability $p_i$ to outcome $i$, then its entropy is

$$\sum_{i=1}^M p_i V(p_i) = \sum_{i=1}^M p_i \frac{p_i^{q-1} - 1}{1 - q} = \frac{\sum_{i=1}^M p_i^q - 1}{1 - q} = S_q(\mu), \qquad (36)$$
where we have used that $\sum_{i=1}^M p_i = 1$.
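The computation in (36) can be checked numerically. The sketch below is our own illustration; the distribution `mu` and the values of $q$ are arbitrary choices, and the helper names are hypothetical. It evaluates the expectation of $V(p) = \log_q(1/p)$, compares it with the closed form, shows the approach to the Shannon entropy (in nats) as $q \to 1$, and checks the pseudo-additivity (31) with $x \oplus_q y = x + y + (1-q)xy$.

```python
import math


def v_tsallis(p: float, q: float) -> float:
    """V(p) = log_q(1/p) = (p^(q-1) - 1) / (1 - q), Eq. (35); log(1/p) when q == 1."""
    if q == 1.0:
        return math.log(1.0 / p)
    return (p ** (q - 1.0) - 1.0) / (1.0 - q)


def entropy_as_expectation(probs, q):
    """Left-hand side of Eq. (36): the expectation of V under the distribution `probs`."""
    return sum(p * v_tsallis(p, q) for p in probs if p > 0.0)


def tsallis_entropy(probs, q):
    """Closed form (sum p_i^q - 1) / (1 - q) from Eq. (36); requires q != 1."""
    return (sum(p ** q for p in probs if p > 0.0) - 1.0) / (1.0 - q)


mu = [0.5, 0.25, 0.125, 0.125]
for q in (0.5, 2.0):
    print(q, entropy_as_expectation(mu, q), tsallis_entropy(mu, q))
print("q -> 1:", tsallis_entropy(mu, 1.000001), "Shannon (nats):", entropy_as_expectation(mu, 1.0))

# Pseudo-additivity, Eq. (31): V(p1 p2) = V(p1) (+)_q V(p2).
q, p1, p2 = 0.7, 0.3, 0.6
v1, v2 = v_tsallis(p1, q), v_tsallis(p2, q)
print(v_tsallis(p1 * p2, q), v1 + v2 + (1.0 - q) * v1 * v2)
```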



5.2 Deformed Factorization 



In this section, we consider a different way of expressing generalized memoryless properties. Equation (30) can be rewritten in a form where the link function appears on the right-hand side. Suppose that

$$\Pr(X > x) = e^{-h(x)} = \bar F(x). \qquad (37)$$

We will define a multiplication $\odot$ such that for all $x, y > 0$

$$\bar F(x) \odot \bar F(y) = \bar F(x + y), \qquad (38)$$

which, with $x = \log(a)$ and $y = \log(b)$, is equivalent to

$$\bar F(\log a) \odot \bar F(\log b) = \bar F(\log ab). \qquad (39)$$

Thus, we define a multiplication using the function inverse of $\bar F(\log(\cdot))$ as link function. The q-multiplication defined in the section on q-analogues, taken with index $2 - q$, is defined such that (38) is true when $\bar F = \frac{1}{\exp_q} = \exp_{2-q}(-\,\cdot\,)$, as in (33). Thus, the memoryless property of this special case of the generalized log-logistic can be written as

$$\Pr(X > a + b) = \Pr(X > a) \otimes_{2-q} \Pr(X > b). \qquad (40)$$

This is an alternative to the deformation of the inside addition, which takes place in the characterization

$$\Pr(X > x_0 \oplus_q x_1) = \Pr(X > x_0)\Pr(X > x_1). \qquad (41)$$

$\otimes_q$-factorization has been used [17; 18] to formulate a Central Limit Theorem for non-extensive statistical mechanics and a q-Hammersley-Clifford Theorem [?].
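Both deformations can be compared numerically for the governing distribution $\Pr(X > x) = 1/\exp_q(x)$ of (33). In the sketch below (illustrative values of $q$, $a$ and $b$; helper names are ours), the outside deformation uses the q-product with index $2 - q$, which is the index that matches this distribution because $1/\exp_q(x) = \exp_{2-q}(-x)$ by (33). The inside deformation uses the q-sum (26) together with an ordinary product of probabilities.

```python
import math


def exp_q(v: float, q: float) -> float:
    """q-exponential (1 + (1-q) v)_+^(1/(1-q)) for q != 1."""
    base = 1.0 + (1.0 - q) * v
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def q_product(x: float, y: float, q: float) -> float:
    """q-product (x^(1-q) + y^(1-q) - 1)_+^(1/(1-q)) for q != 1."""
    base = x ** (1.0 - q) + y ** (1.0 - q) - 1.0
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def q_sum(x: float, y: float, q: float) -> float:
    """x (+)_q y = x + y + (1-q) x y, Eq. (26)."""
    return x + y + (1.0 - q) * x * y


q = 0.6  # governing survival function 1 / exp_q(x), Eq. (33), a GLLD2 with alpha = 1 / (1 - q)


def survival(x: float) -> float:
    return 1.0 / exp_q(x, q)


a, b = 0.8, 2.5
# Outside deformation: q-product with index 2 - q on the probabilities.
print(survival(a + b), q_product(survival(a), survival(b), 2.0 - q))
# Inside deformation: q-sum of the arguments, ordinary product of probabilities.
print(survival(q_sum(a, b, q)), survival(a) * survival(b))
```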



5.3 Generalized Statistical Theorems 



Many theorems, e.g. the Central Limit Theorem [17; 18], have been generalized to Tsallis statistics. This possibility is not surprising: given the perspective of this article, we can apply the classical theorem to $h(X)$ and pull back the assumptions and conclusions.
6 General Results



To avoid some unnecessary technical issues, we will in this section assume that all governing cumulative distribution functions are strictly increasing on the set where their value lies in the open interval $(0, 1)$. Our reasoning in previous sections has established the following theorems:

Theorem 1 Suppose that $h : \Omega \to \mathbb{R}_+$ is bijective. If we define an order on $\Omega$ by letting $x \le y$ for $x, y \in \Omega$ iff $h(x) \le h(y)$ in $\mathbb{R}_+$, and a group operation $\times_h$ on $\Omega$ by letting $x \times_h y = h^{-1}(h(x) + h(y))$, then the resulting structure $\Omega_h$ is an ordered semigroup and

$$h(x \times_h y) = h(x) + h(y), \qquad (42)$$

or in other words, $h$ is an extensive measurement scale for $\Omega_h$.

The next theorem is our way of formulating one direction of Ghitany's main theorem.

Theorem 2 (Ghitany) If $F$ is a cumulative distribution function on $(-\infty, \infty)$, then there is a function $h$ from $\Omega = \{x \in \mathbb{R} \mid F(x) \in (0, 1)\}$ to $\mathbb{R}_+$ such that $\Pr(h(X) > x + y) = \Pr(h(X) > x)\Pr(h(X) > y)$ for $x, y > 0$, which implies that there is $\lambda > 0$ such that $\Pr(h(X) > x) = e^{-\lambda x}$ and $\Pr(X > y) = e^{-\lambda h(y)}$ for $y \in \Omega$.

Suppose we have an interval $\Omega$ in $\mathbb{R}$, a governing distribution $F$ with survival function $\bar F$ and a link function $h : \Omega \to \mathbb{R}_+$ such that $\Pr(h(X) > x + y) = \Pr(h(X) > x)\Pr(h(X) > y)$, $x, y > 0$. The existence of $h$ is guaranteed by Ghitany's theorem. Theorem 1 provides us with an ordered semigroup structure on $\Omega$, which we denote by $\Omega_h$. Denoting the group operation by $\oplus_h$, we can express the memoryless property of $F$ by

$$\Pr(X > x \oplus_h y) = \Pr(X > x)\Pr(X > y).$$

Theorem 3 Suppose that $V : [0, 1] \to \Omega$ is defined by $V(\cdot) = \bar F^{-1}(\cdot)$ with definitions as above. Then $V(p_1 p_2) = V(p_1) \oplus_h V(p_2)$ and, furthermore, if $(A, \mu)$ is a probability space and we define an entropy by letting $S(\mu) = E_\mu(V)$, it is extensive with respect to $\oplus_h$.

Theorem 3 says that a governing distribution provides a link function, an ordered semigroup structure with respect to which $h$ is an extensive measurement scale, and an entropy which is extensive with respect to $\oplus_h$.
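The first statement of Theorem 3 can be illustrated with a governing distribution other than the log-logistic ones. The sketch below assumes a Weibull survival function $e^{-t^\beta}$ with $\beta = 2$ (an illustrative choice; `BETA`, `oplus_h` and `v` are hypothetical names), so that $h(t) = t^2$ and the pulled-back addition of Theorem 1 is $x \oplus_h y = \sqrt{x^2 + y^2}$. It checks $V(p_1 p_2) = V(p_1) \oplus_h V(p_2)$ and the memoryless property in the form $\Pr(X > x \oplus_h y) = \Pr(X > x)\Pr(X > y)$.

```python
import math

BETA = 2.0  # illustrative Weibull shape; survival exp(-t**BETA), so h(t) = t**BETA and lambda = 1


def survival(t: float) -> float:
    return math.exp(-(t ** BETA))


def h(t: float) -> float:
    return t ** BETA


def h_inv(s: float) -> float:
    return s ** (1.0 / BETA)


def oplus_h(x: float, y: float) -> float:
    """Pulled-back addition of Theorem 1: h^{-1}(h(x) + h(y)); for BETA = 2, sqrt(x^2 + y^2)."""
    return h_inv(h(x) + h(y))


def v(p: float) -> float:
    """V = inverse of the survival function, as in Theorem 3: V(p) = (log(1/p))^(1/BETA)."""
    return h_inv(math.log(1.0 / p))


p1, p2 = 0.3, 0.45
print(v(p1 * p2), oplus_h(v(p1), v(p2)))                            # V(p1 p2) = V(p1) (+)_h V(p2)
print(survival(oplus_h(0.7, 1.1)), survival(0.7) * survival(1.1))  # memoryless with (+)_h
```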

Theorem 4 We have also seen that if we let $h(\cdot) = \bar F(\log(\cdot))$ and let $p_1 \otimes_{h^{-1}} p_2 = h(h^{-1}(p_1)\,h^{-1}(p_2))$, we can express the memoryless property of $F$ by

$$\Pr(X > x + y) = \Pr(X > x) \otimes_{h^{-1}} \Pr(X > y). \qquad (43)$$



7 A Rainfall Example 

We have in this article used rainfall modeling as an example. Hydrological modeling is a whole science in itself, and we will here just give a concrete example with data from the river Spey (at Kinrara) in Scotland, previously studied in many papers including [20], where log-logistic models are applied. The data is an annual maximum series in m$^3$s$^{-1}$ for the years 1952-1982. The series is:

89.8, 109.1, 202.2, 146.3, 212.3, 116.7, 109.1, 80.7, 127.4, 138.8, 283.5, 85.6, 105.5, 118.0, 387.8, 80.7, 165.7, 111.6, 134.4, 131.5, 102.0, 242.5, 214.8, 144.6, 114.2, 98.3, 102.8, 104.3, 196.2, 143.7.

To model such data it is common to combine data from hydrologically homogeneous zones to find enough data to estimate parameters from. We will not try to estimate anything but just give an example of what a Tsallis scale transformation is doing. If we let $X$ be the difference between the annual maximum and 60, which is approximately half the median of the data, we would consider the link function $\log(\exp_q(X)) = \frac{1}{1-q}\log(1 + (1 - q)X)$. Since $e^{(1-q)x} \approx 1 + (1 - q)x$ if $(1 - q)x$ is close to zero, we are not changing the scale by much as long as we are close to zero, while larger values are rescaled more substantially. If we apply this transformation with $q = 1/2$ (using base-10 logarithms), the resulting sequence has as its smallest elements 2.1, 2.1, 2.3, 2.4 and as its largest 4.4, 4.1, 3.9, 3.8. We have arrived at a more tempered scale. We do not, however, claim that it truly is an extensive measurement scale for the studied phenomenon.
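The transformed values quoted above can be reproduced with a few lines. We infer, as an assumption, that they were computed as the base-10 logarithm of $\exp_q(X)$ with $q = 1/2$, i.e. $2\log_{10}(1 + X/2)$ with $X$ the annual maximum minus 60; changing the base of the logarithm only rescales all values by a constant factor. The helper `tsallis_scale` is hypothetical.

```python
import math

# Annual maximum series for the Spey (at Kinrara) as listed above.
series = [89.8, 109.1, 202.2, 146.3, 212.3, 116.7, 109.1, 80.7, 127.4, 138.8,
          283.5, 85.6, 105.5, 118.0, 387.8, 80.7, 165.7, 111.6, 134.4, 131.5,
          102.0, 242.5, 214.8, 144.6, 114.2, 98.3, 102.8, 104.3, 196.2, 143.7]


def tsallis_scale(value: float, q: float = 0.5, shift: float = 60.0) -> float:
    """log10(exp_q(X)) with X = value - shift, i.e. log10(1 + (1-q) X) / (1-q)."""
    x = value - shift
    return math.log10(1.0 + (1.0 - q) * x) / (1.0 - q)


transformed = sorted(tsallis_scale(v) for v in series)
print("smallest:", [round(t, 1) for t in transformed[:4]])        # [2.1, 2.1, 2.3, 2.4]
print("largest:", [round(t, 1) for t in transformed[::-1][:4]])   # [4.4, 4.1, 3.9, 3.8]
```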



8 Remarks 

8.1 Events at Different Levels

Sometimes the word scale is used in a different sense than it has been used in this article, e.g. subatomic, atomic, molecular, cellular, organism, sociological and astronomical scale. An interesting property of the Tsallis entropy scale, which is based on the distribution with $\Pr(X > x) = \frac{1}{\exp_q(x)}$, is that if $\alpha = n$ is a natural number and $\Pr(X_i > t) = \frac{1}{1 + t}$, then

$$\Pr(X > x) = \prod_{i=1}^n \Pr\left(X_i > \frac{x}{n}\right). \qquad (44)$$
If the $X_i$ are independent and we let $X = n\min_i X_i$, then $X$ satisfies (44). This situation could arise if the $X_i$ are components in a chain which produces $n$ times more units of something, e.g. energy or money, than a single component, but which also has the property that the entire chain stops if any component breaks. A similar situation would occur if we need equal amounts of $n$ different substances to make a certain product. The residues are not observable if we only see the resulting product. This is typical when we observe a complex multi-level system at a specific level. If we look at this from a superstatistics point of view, where we work with Gamma distributions, we note that a Gamma distribution with integer shape parameter $k$ is the distribution of a sum of $k$ independent, identically exponentially distributed random variables. It is also true that if we add independent Gamma distributed random variables with the same scale parameter, the result is Gamma distributed with the same scale but with a shape that is the sum of the shapes of the terms. These properties are suitable if we want to model observations which are really combinations of events at a smaller, unobserved level.
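Relation (44) can be checked by simulation. The sketch below (seed, sample size and helper names are our assumptions) draws $n$ independent components with $\Pr(X_i > t) = 1/(1+t)$, forms $X = n\min_i X_i$ and compares the estimated tail with $(1 + x/n)^{-n}$, the survival function in (33) with $\alpha = n$.

```python
import random


def sample_component(rng: random.Random) -> float:
    """One draw with Pr(X_i > t) = 1 / (1 + t), obtained by inverting the survival function."""
    u = 1.0 - rng.random()          # u lies in (0, 1], so 1/u is finite
    return 1.0 / u - 1.0


def chain_tail(n: int, x: float, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of Pr(n * min_i X_i > x) for n independent components."""
    hits = 0
    for _ in range(trials):
        if n * min(sample_component(rng) for _ in range(n)) > x:
            hits += 1
    return hits / trials


rng = random.Random(11)
n = 3   # alpha = n in Eq. (33), i.e. q = 1 - 1/n
for x in (0.5, 2.0, 6.0):
    print(f"x = {x}: simulated {chain_tail(n, x, 100_000, rng):.4f}, "
          f"(1 + x/n)^(-n) = {(1.0 + x / n) ** (-n):.4f}")
```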



8.2 Radial Basis Functions 



A Radial Basis Function is a function $h : \mathbb{R}_+ \to \mathbb{R}_+$. The idea is to transform the Euclidean distance by considering $h(|x - y|)$ instead of $|x - y|$, i.e. it is about changing the scale. It is often used in interpolation theory when we want to approximate a function $f : \mathbb{R}^n \to \mathbb{R}$ from a finite number of (possibly approximate) function values $f(x_i)$. The rescaling of the norm can be expressed as using a feature map into a Reproducing Kernel Hilbert Space (RKHS) with kernel $k(x, y) = h(|x - y|)$. A common choice is the Gaussian $h(r) = e^{-r^2}$. It focuses the influence of a function value much more on the immediate surroundings of the point, while it almost extinguishes its long-range influence. RKHS are used in many application areas, including chemical physics, where they are used to construct multidimensional molecular potential energy surfaces [21].
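A Gaussian RBF interpolant can be sketched in a few lines of plain Python. The example below is illustrative: the width parameter `eps`, the nodes and the test function $\sin$ are our choices, and a small Gaussian-elimination routine stands in for a linear-algebra library. It solves $Kc = f$ with $K_{ij} = h(|x_i - x_j|)$ and evaluates $s(x) = \sum_j c_j h(|x - x_j|)$, which reproduces the data at the nodes.

```python
import math


def gaussian(r: float, eps: float = 1.0) -> float:
    """Gaussian radial basis function h(r) = exp(-(eps r)^2); eps is an illustrative width."""
    return math.exp(-(eps * r) ** 2)


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def rbf_interpolant(nodes, values, eps: float = 1.0):
    """Return s(x) = sum_j c_j h(|x - x_j|), with c chosen so that s(x_i) = values[i]."""
    kernel = [[gaussian(abs(xi - xj), eps) for xj in nodes] for xi in nodes]
    coeffs = solve(kernel, values)
    return lambda x: sum(c * gaussian(abs(x - xj), eps) for c, xj in zip(coeffs, nodes))


nodes = [0.0, 0.5, 1.0, 1.5, 2.0]
values = [math.sin(t) for t in nodes]
s = rbf_interpolant(nodes, values, eps=1.5)
print(max(abs(s(t) - v) for t, v in zip(nodes, values)))  # essentially zero at the nodes
print(s(0.75), math.sin(0.75))                              # an approximation between nodes
```

For distinct nodes the Gaussian kernel matrix is positive definite, so the system is always solvable; for larger problems a library solver would be the natural replacement for `solve`.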



9 Summary 



We have defined an extensive measurement scale for a random variable $X$ as a function $h$ which makes the distribution of $h(X)$ memoryless. An alternative to applying $h$ to $X$ is to deform the elementary mathematical operations that we use to analyze $X$. This includes deforming the entropy. By choosing different governing distributions, a general class of entropies and information measures arises. We have shown that Tsallis entropy can be derived in this way from a class of generalized log-logistic distributions.